Floresta Sintá(c)tica: Bigger, Thicker and Easier
Abstract
In this paper, we describe the resumption of activities of Floresta Sintá(c)tica, a treebank for Portuguese. We present some underlying guidelines around the project and how they influence our linguistic choices. We then describe the new texts added to the treebank, proceed to mention the new syntactic information added to the old texts, and finally describe the new user-friendly search system and the plans for its expansion.
Similar resources
Floresta Sintá(c)tica: A treebank for Portuguese
This paper reviews the first year of the creation of a publicly available treebank for Portuguese, Floresta Sintá(c)tica, a collaboration project between the VISL and the Computational Processing of Portuguese projects. After briefly describing the main goals and the organization of the project, the creation of the annotated objects is presented in detail: preparing the text to be annotated, ap...
Automatic Semantic-Role Annotation for Portuguese
The paper presents and evaluates a parsing system for the automatic annotation of Portuguese text with semantic role tags. All in all, 38 different categories, such as agent, patient, and location, are distinguished. The annotator uses a grammar of 500 hand-written Constraint Grammar rules and exploits syntactic dependency links as well as semantic prototype classes and syntactic function. The int...
Adaptation of Data and Models for Probabilistic Parsing of Portuguese
We present the first results for recovering word-word dependencies from a probabilistic parser for Portuguese trained on and evaluated against human-annotated syntactic analyses. We use the Floresta Sintá(c)tica with the Bikel multi-lingual parsing engine and evaluate performance on both PARSEVAL and unlabeled dependencies. We explore several configurations, both in terms of parameterizing the ...
Constraint Grammar-based conversion of Dependency Treebanks
This paper presents a new method for the conversion of one style of dependency treebanks into another, using contextual, Constraint Grammar-based transformation rules for both structural changes (attachment) and changes in syntactic-functional tags (edge labels). In particular, we address the conversion of traditional syntactic dependency annotation into the semantically motivated dependency ann...
Part-of-Speech Tagging of Portuguese Using Hidden Markov Models with Character Language Model Emissions
This paper presents a probabilistic approach to POS tagging that combines HMMs and character language models, applied to Portuguese texts. In this approach, the emission probabilities for each hidden state in an HMM are estimated by a proper character language model. The tagger has been trained and tested on Bosque, a subset of the Floresta Sintá(c)tica treebank, reaching 96.2% accuracy ...